Abstract: This paper considers the entropy of the sum of 
(possibly dependent and non-identically distributed) Bernoulli 
random variables. Upper bounds on the error that follows 
from an approximation of this entropy by the entropy of a 
Poisson random variable with the same mean are derived. The 
derivation of these bounds combines elements of information 
theory with the Chen-Stein method for Poisson approximation. 
The resulting bounds are easy to compute, and their applicability 
is exemplified. This conference paper presents in part the first 
half of the paper entitled "An information-theoretic perspective 
of the Poisson approximation via the Chen-Stein method" (see: 
http://arxiv.org/abs/1206.6811). A generalization of the bounds 
that considers the accuracy of the Poisson approximation for the 
entropy of a sum of non-negative, integer-valued and bounded 
random variables is introduced in the full paper. It also derives 
lower bounds on the total variation distance, relative entropy and 
other measures that are not considered in this conference paper. 

Index Terms — Chen-Stein method, entropy, information theory, 
Poisson approximation, total variation distance. 

I. Introduction 

Convergence to the Poisson distribution, for the number of 
occurrences of possibly dependent events, naturally arises in 
various applications. Following the work of Poisson, there has 
been considerable interest in how well the Poisson distribution 
approximates the binomial distribution. This approximation 
was treated by a limit theorem in [13, Chapter 8], and later 
some non-asymptotic results studied the accuracy of this 
approximation. The Poisson approximation and 
later the compound Poisson approximation have been treated 
extensively in the probability and statistics literature (see, e.g., 
ISl-llini, Uni-IQJJ, L26J-L34J and references therein). 

Among modern methods, the Chen-Stein method forms 
a powerful probabilistic tool that is used to calculate error 
bounds when the Poisson approximation serves to assess the 
distribution of a sum of (possibly dependent) Bernoulli random 
variables [10]. This method is based on the simple property 
of the Poisson distribution where $Z \sim \mathrm{Po}(\lambda)$ with $\lambda \in (0, \infty)$ 
if and only if $\lambda \, E[f(Z+1)] - E[Z f(Z)] = 0$ for all bounded 
functions $f$ that are defined on $\mathbb{N}_0 = \{0, 1, \ldots\}$. This method 
provides a rigorous analytical treatment, via error bounds, to 
the case where $W$ has approximately a Poisson distribution 
$\mathrm{Po}(\lambda)$, so it is expected that $\lambda \, E[f(W+1)] - E[W f(W)] \approx 0$ 
for an arbitrary bounded function $f$ that is defined on $\mathbb{N}_0$. 
The reader is referred to some nice surveys on the Chen-Stein 
method in S, iH, Chapter 2], Q, IIH Chapter 2], fWl. 
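
The characterizing identity above can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper names (`poisson_pmf`, `binomial_pmf`, `stein_residual`), the test function f(k) = 1/(1+k), and the truncation of the Poisson support at 100 are all illustrative assumptions. It evaluates $\lambda E[f(W+1)] - E[W f(W)]$ for a Poisson law, where it should vanish up to rounding, and for a binomial law with the same mean, where it is small but nonzero.

```python
import math

def poisson_pmf(lam, kmax):
    """Return [P(Z=0), ..., P(Z=kmax)] for Z ~ Po(lam) via a simple recursion."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def binomial_pmf(n, p):
    """Return the pmf of Binomial(n, p) on {0, ..., n}."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def stein_residual(pmf, lam, f):
    """lam*E[f(W+1)] - E[W f(W)] for W with the given pmf on {0, ..., len(pmf)-1}."""
    return sum(p * (lam * f(k + 1) - k * f(k)) for k, p in enumerate(pmf))

lam = 2.0
f = lambda k: 1.0 / (1.0 + k)   # an arbitrary bounded test function on N_0 (assumption)

# Poisson case: the residual vanishes (up to truncation and rounding).
res_poisson = stein_residual(poisson_pmf(lam, 100), lam, f)

# Binomial(20, 0.1) has the same mean but is not Poisson: the residual is nonzero.
res_binomial = stein_residual(binomial_pmf(20, lam / 20), lam, f)

print(res_poisson, res_binomial)
```

The size of the residual for a non-Poisson law is what the Chen-Stein method turns into explicit error bounds; the sketch only illustrates the identity itself.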



During the last decade, information-theoretic methods were 
exploited to establish convergence to Poisson and compound 
Poisson limits in suitable paradigms. An information-theoretic 
study of the convergence rate of the binomial-to-Poisson 
distribution, in terms of the relative entropy between the 
binomial and Poisson distributions, was provided in [15], 
and maximum entropy results for the binomial, Poisson and 
compound Poisson distributions were studied in [14], [19], 
El, US, ||33, 1361 and ET). The law of small numbers refers 
to the phenomenon that, for random variables $\{X_i\}_{i=1}^n$ on $\mathbb{N}_0$, 
the sum $\sum_{i=1}^n X_i$ is approximately Poisson distributed with 
mean $\lambda = \sum_{i=1}^n p_i$ as long as (qualitatively) the following 
conditions hold: 

• $P(X_i = 0) \approx 1$, and $P(X_i = 1)$ is uniformly small, 

• $P(X_i > 1)$ is negligible as compared to $P(X_i = 1)$, 

• $\{X_i\}_{i=1}^n$ are weakly dependent. 

An information-theoretic study of the law of small numbers 
was provided in [24] via the derivation of upper bounds 
on the relative entropy between the distribution of the sum 
of possibly dependent Bernoulli random variables and the 
Poisson distribution with the same mean. An extension of the 
law of small numbers to a thinning limit theorem for convolutions 
of discrete distributions that are defined on $\mathbb{N}_0$ was 
introduced in [16], followed by an analysis of the convergence 
rate and some non-asymptotic results. Further work in this 
direction was studied in [21], and the work in tU provides 
an information-theoretic study for the problem of compound 
Poisson approximation, which parallels the earlier study for 
the Poisson approximation in [24]. Nice surveys on this line of 
work are provided in fTll Chapter 7], 1251, and |[l2l Chapter 2] 
surveys some commonly-used metrics between probability 
measures with some pointers to the Poisson approximation. 

This paper provides an information-theoretic study of Pois- 
son approximation, and it combines elements of information 
theory with the Chen-Stein method. The novelty in this paper, 
in comparison to previous related works, is related to the 
derivation of upper bounds on the error that follows from an 
approximation of the entropy of a sum of possibly dependent 
and non-identically distributed Bernoulli random variables by 
the entropy of a Poisson random variable with the same mean 
(see Theorem 5 and some of its consequences in Section II). 
The use of these new bounds is exemplified, partially relying 
on interesting applications of the Chen-Stein method from [3]. 



II. Error Bounds on the Entropy of the Sum of 
Bernoulli Random Variables 

This section considers the entropy of a sum of (possibly 
dependent and non-identically distributed) Bernoulli random 
variables. Section II-A provides a review of some known 
results on the Poisson approximation, via the Chen-Stein 
method, that are relevant to the derivation of the new bounds 
(see [31, Section 2]). Section II-B introduces explicit upper 
bounds on the error that follows from the approximation of the 
entropy of a sum of Bernoulli random variables by the entropy 
of a Poisson random variable with the same mean. Some 
applications of the new bounds are exemplified in Section II-C. 

A. Background 

In the following, the term 'distribution' refers to the prob- 
ability mass function of an integer-valued random variable. 

Definition 1: Let $P$ and $Q$ be two probability measures 
defined on a set $\mathcal{X}$. Then, the total variation distance between 
$P$ and $Q$ is defined by 

$$d_{TV}(P, Q) \triangleq \sup_{\text{Borel } A \subseteq \mathcal{X}} |P(A) - Q(A)| \tag{1}$$

where the supremum is taken w.r.t. all the Borel subsets $A$ of 
$\mathcal{X}$. If $\mathcal{X}$ is a countable set then (1) is simplified to 

$$d_{TV}(P, Q) = \frac{1}{2} \sum_{x \in \mathcal{X}} |P(x) - Q(x)| = \frac{\|P - Q\|_1}{2} \tag{2}$$

so the total variation distance is equal to one-half of the $L_1$-
distance between the two probability distributions. 
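
As a small illustration of Definition 1, the sketch below compares the two expressions (1) and (2) on a three-point set. The function names and the two example distributions are hypothetical choices made here for illustration; the brute-force supremum over all subsets is only feasible for very small supports.

```python
from itertools import combinations

def total_variation(p, q):
    """Half the L1 distance between two pmfs on the same finite support, as in (2)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def total_variation_bruteforce(p, q):
    """Supremum of |P(A) - Q(A)| over all subsets A of the support, as in (1)."""
    indices = range(len(p))
    best = 0.0
    for size in range(len(p) + 1):
        for subset in combinations(indices, size):
            best = max(best, abs(sum(p[i] - q[i] for i in subset)))
    return best

# Two example pmfs on a three-point set (illustrative values).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

tv_half_l1 = total_variation(p, q)          # uses (2)
tv_sup = total_variation_bruteforce(p, q)   # uses (1)
print(tv_half_l1, tv_sup)
```

Both computations agree here, with the supremum attained, for instance, by the single-point event on which $P$ exceeds $Q$ the most.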

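The following theorem combines [6, Theorems 1 and 2], 
and its proof relies on the Chen-Stein method: 

Theorem 1: Let $W = \sum_{i=1}^n X_i$ be a sum of $n$ independent 
Bernoulli random variables with $E(X_i) = p_i$ for 
$i \in \{1, \ldots, n\}$, and $E(W) = \lambda$. Then, the total variation 
distance between the probability distribution of $W$ and the 
Poisson distribution with mean $\lambda$ satisfies 

$$\frac{1}{32} \Bigl(1 \wedge \frac{1}{\lambda}\Bigr) \sum_{i=1}^n p_i^2 \le d_{TV}(P_W, \mathrm{Po}(\lambda)) \le \Bigl(\frac{1 - e^{-\lambda}}{\lambda}\Bigr) \sum_{i=1}^n p_i^2 \tag{3}$$

where $a \wedge b \triangleq \min\{a, b\}$ for every $a, b \in \mathbb{R}$. 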

Remark 1: The ratio between the upper and lower bounds 
in Theorem 1 is not larger than 32, irrespective of the values 
of $\{p_i\}$. This shows that these bounds are essentially tight. The 
upper bound in (3) improves Le Cam's inequality (see [26], 
[34]) which states that $d_{TV}(P_W, \mathrm{Po}(\lambda)) \le \sum_{i=1}^n p_i^2$, so the 
improvement, for $\lambda \gg 1$, is by the factor $\frac{1}{\lambda}$. 
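
For small $n$, the bounds in (3) can be compared with the exact total variation distance. The following is a hedged sketch under stated assumptions: the probabilities $p_i = 0.01 i$ for $i = 1, \ldots, 20$ are an arbitrary illustrative choice, the helper names are hypothetical, and the exact distribution of $W$ is obtained by direct convolution, which is only practical for moderate $n$.

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables, by convolution."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def poisson_pmf(lam, kmax):
    """Return [P(Z=0), ..., P(Z=kmax)] for Z ~ Po(lam)."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def tv_to_poisson(ps, extra=100):
    """Total variation distance, as in (2), between P_W and Po(lam) with lam = sum(ps)."""
    lam = sum(ps)
    pw = bernoulli_sum_pmf(ps) + [0.0] * extra   # W has no mass beyond n
    pz = poisson_pmf(lam, len(pw) - 1)
    tail = max(0.0, 1.0 - sum(pz))               # Poisson mass beyond the truncation
    return 0.5 * (sum(abs(a - b) for a, b in zip(pw, pz)) + tail)

def barbour_hall_bounds(ps):
    """Lower and upper bounds of (3)."""
    lam = sum(ps)
    s2 = sum(p * p for p in ps)
    lower = min(1.0, 1.0 / lam) * s2 / 32.0
    upper = -math.expm1(-lam) / lam * s2
    return lower, upper

ps = [0.01 * i for i in range(1, 21)]       # illustrative probabilities (assumption)
tv = tv_to_poisson(ps)
lower, upper = barbour_hall_bounds(ps)
le_cam = sum(p * p for p in ps)             # Le Cam's bound from Remark 1
print(lower, tv, upper, le_cam)
```

In this example the exact distance falls between the two bounds of (3), and the upper bound of (3) is smaller than Le Cam's bound, in line with Remark 1.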

Theorem 1 provides a non-asymptotic result for the Poisson 
approximation of sums of independent binary random variables 
via the use of the Chen-Stein method. In general, this 
method enables the analysis of the Poisson approximation for sums 
of dependent random variables. To this end, the following 
notation was used in [2] and [3]: 

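Let $I$ be a countable index set, and for $\alpha \in I$, let $X_\alpha$ be a 
Bernoulli random variable with 

$$p_\alpha \triangleq P(X_\alpha = 1) = 1 - P(X_\alpha = 0) > 0. \tag{4}$$

Let 

$$W \triangleq \sum_{\alpha \in I} X_\alpha, \quad \lambda \triangleq E(W) = \sum_{\alpha \in I} p_\alpha \tag{5}$$

where it is assumed that $\lambda \in (0, \infty)$. For every $\alpha \in I$, let $B_\alpha$ 
be a subset of $I$ that is chosen such that $\alpha \in B_\alpha$. This subset 
is interpreted in [2] as the neighborhood of dependence for $\alpha$ 
in the sense that $X_\alpha$ is independent or weakly dependent of all 
of the $X_\beta$ for $\beta \notin B_\alpha$. Furthermore, the following coefficients 
were defined in [2, Section 2]: 

$$b_1 \triangleq \sum_{\alpha \in I} \sum_{\beta \in B_\alpha} p_\alpha p_\beta \tag{6}$$

$$b_2 \triangleq \sum_{\alpha \in I} \sum_{\alpha \ne \beta \in B_\alpha} p_{\alpha,\beta}, \quad p_{\alpha,\beta} \triangleq E(X_\alpha X_\beta) \tag{7}$$

$$b_3 \triangleq \sum_{\alpha \in I} s_\alpha, \quad s_\alpha \triangleq E \bigl| E\bigl(X_\alpha - p_\alpha \,\big|\, \sigma(\{X_\beta\}_{\beta \in I \setminus B_\alpha})\bigr) \bigr| \tag{8}$$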



where $\sigma(\cdot)$ in the conditioning of (8) denotes the $\sigma$-algebra that 
is generated by the random variables inside the parenthesis. In 
the following, we cite [2, Theorem 1] which essentially implies 
that when $b_1$, $b_2$ and $b_3$ are all small, then the total number 
$W$ of events is approximately Poisson distributed. 

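Theorem 2: Let $W = \sum_{\alpha \in I} X_\alpha$ be a sum of (possibly 
dependent and non-identically distributed) Bernoulli random 
variables $\{X_\alpha\}_{\alpha \in I}$. Then, with the notation in (4)-(8), the 
following upper bound on the total variation distance holds: 

$$d_{TV}(P_W, \mathrm{Po}(\lambda)) \le (b_1 + b_2) \Bigl(\frac{1 - e^{-\lambda}}{\lambda}\Bigr) + b_3 \Bigl(1 \wedge \frac{1.4}{\sqrt{\lambda}}\Bigr). \tag{9}$$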



Remark 2: A comparison of the right-hand side of (9) with 
the bound in [2, Theorem 1] shows a difference in a factor 
of 2 between the two upper bounds. This follows from a 
difference in a factor of 2 between the two definitions of the 
total variation distance in [2, Section 2] and Definition 1 here. 
Note however that Definition 1 is consistent with, e.g., [6]. 

Remark 3: Theorem 2 forms a generalization of the upper 
bound in Theorem 1 by choosing $B_\alpha = \{\alpha\}$ for $\alpha \in I = 
\{1, \ldots, n\}$ (note that, due to the independence assumption of 
the Bernoulli random variables in Theorem 1, the neighborhood 
of dependence of $\alpha$ is $\alpha$ itself). In this setting, under 
the independence assumption, $b_1 = \sum_{i=1}^n p_i^2$ and $b_2 = b_3 = 0$, 
which therefore gives, from (9), the upper bound in (3). 

The following inequality holds (see [11, Theorem 17.3.3]): 

Theorem 3: Let $P$ and $Q$ be two probability mass functions 
on a finite set $\mathcal{X}$ such that the $L_1$ norm of their difference is 
not larger than one-half, i.e., 

$$\|P - Q\|_1 \triangleq \sum_{x \in \mathcal{X}} |P(x) - Q(x)| \le \frac{1}{2}. \tag{10}$$

Then the difference between their entropies satisfies 

$$|H(P) - H(Q)| \le -\|P - Q\|_1 \log\Bigl(\frac{\|P - Q\|_1}{|\mathcal{X}|}\Bigr). \tag{11}$$



The bounds on the total variation distance for the Poisson 
approximation (see Theorems 1 and 2) and the $L_1$ bound on 
the entropy (see Theorem 3) motivate the derivation of a bound on 
$|H(W) - H(Z)|$ where $W = \sum_{\alpha \in I} X_\alpha$ is a sum of 
(possibly dependent and non-identically distributed) Bernoulli 
random variables, and $Z \sim \mathrm{Po}(\lambda)$ is Poisson distributed 
with mean $\lambda = \sum_{\alpha \in I} p_\alpha$. The problem is that the Poisson 
distribution is defined on a countable set that is infinite, so 
the bound in Theorem 3 is not applicable for the considered 
problem of Poisson approximation. This motivates the theorem 
in the next sub-section. Before proceeding to this analysis, the 
following maximum entropy result of the Poisson distribution 
is introduced for the special case where the Bernoulli random 
variables are independent. This maximum entropy result follows 
directly from [14, Theorems 7 and 8]. 

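Theorem 4: The Poisson distribution $\mathrm{Po}(\lambda)$ has the maximal 
entropy among all probability distributions with mean $\lambda$ 
that can be obtained as sums of independent Bernoulli RVs: 

$$H(\mathrm{Po}(\lambda)) = \sup_{S \in B_\infty(\lambda)} H(S), \quad B_\infty(\lambda) \triangleq \bigcup_{n \in \mathbb{N}} B_n(\lambda), \quad B_n(\lambda) \triangleq \Bigl\{ S : S = \sum_{i=1}^n X_i, \; X_i \sim \mathrm{Bern}(p_i), \; \sum_{i=1}^n p_i = \lambda \Bigr\} \tag{12}$$

where in the above sum, $\{X_i\}_{i=1}^n$ are independent Bernoulli 
random variables. Furthermore, since the supremum of the 
entropy over the set $B_n(\lambda)$ is monotonic increasing in $n$, then 

$$H(\mathrm{Po}(\lambda)) = \lim_{n \to \infty} \sup_{S \in B_n(\lambda)} H(S).$$

For $n \in \mathbb{N}$, the maximum entropy distribution in the class 
$B_n(\lambda)$ is the Binomial distribution of the sum of $n$ i.i.d. 
Bernoulli random variables $\mathrm{Bern}\bigl(\frac{\lambda}{n}\bigr)$, so 

$$H(\mathrm{Po}(\lambda)) = \lim_{n \to \infty} H\Bigl(\mathrm{Binomial}\Bigl(n, \frac{\lambda}{n}\Bigr)\Bigr).$$

Calculation of the entropy of a Poisson random variable: 
In the next sub-section we consider the approximation of the 
entropy of a sum of Bernoulli random variables by the entropy 
of a Poisson random variable with the same mean. To this 
end, it is required to evaluate the entropy of $Z \sim \mathrm{Po}(\lambda)$. It is 
straightforward to verify that 

$$H(Z) = \lambda \log\Bigl(\frac{e}{\lambda}\Bigr) + \sum_{k=1}^{\infty} \frac{\lambda^k e^{-\lambda} \log k!}{k!} \tag{13}$$

so the entropy of the Poisson distribution (in nats) is expressed 
in terms of an infinite series that has no closed 
form. Sequences of simple upper and lower bounds on this 
entropy, which are asymptotically tight, were derived in [1]. 
In particular, for large values of $\lambda$, 

$$H(Z) \approx \frac{1}{2} \log(2 \pi e \lambda) - \frac{1}{12 \lambda} - \frac{1}{24 \lambda^2}. \tag{14}$$

B. New Error Bounds on the Entropy 

We introduce here new error bounds on the entropy of 
Bernoulli sums. Due to space limitations, the proofs are 
omitted. The proofs are available in the full paper version 
(see [31, Section II.D]). 

Theorem 5: Let $I$ be an arbitrary finite index set with $m \triangleq |I|$. 
Under the assumptions of Theorem 2 and the notation used 
in Eqs. (4)-(8), let 

$$a(\lambda) \triangleq 2 \Bigl[ (b_1 + b_2) \Bigl(\frac{1 - e^{-\lambda}}{\lambda}\Bigr) + b_3 \Bigl(1 \wedge \frac{1.4}{\sqrt{\lambda}}\Bigr) \Bigr] \tag{15}$$

$$b(\lambda) \triangleq \Bigl[ \Bigl(\lambda \log\Bigl(\frac{e}{\lambda}\Bigr)\Bigr)_+ + \lambda^2 + \frac{6 \log(2\pi) + 1}{12} \Bigr] \exp\Bigl\{ -\Bigl[ \lambda + (m - 1) \log\Bigl(\frac{m - 1}{\lambda e}\Bigr) \Bigr] \Bigr\} \tag{16}$$

where, in (16), $(x)_+ \triangleq \max\{x, 0\}$ for every $x \in \mathbb{R}$. Let $Z \sim 
\mathrm{Po}(\lambda)$ be a Poisson random variable with mean $\lambda$. If $a(\lambda) \le \frac{1}{2}$ 
and $\lambda \triangleq \sum_{\alpha \in I} p_\alpha \le m - 1$, then the difference between the 
entropies (to the base $e$) of $Z$ and $W$ satisfies the inequality: 

$$|H(Z) - H(W)| \le a(\lambda) \log\Bigl(\frac{m + 2}{a(\lambda)}\Bigr) + b(\lambda). \tag{17}$$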
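
The quantities of Theorem 5 are simple to evaluate. The following is a minimal sketch, assuming the forms $a(\lambda) = 2[(b_1 + b_2)(1 - e^{-\lambda})/\lambda + b_3 (1 \wedge 1.4/\sqrt{\lambda})]$ and $b(\lambda) = [(\lambda \log(e/\lambda))_+ + \lambda^2 + (6\log(2\pi)+1)/12] \exp\{-[\lambda + (m-1)\log((m-1)/(\lambda e))]\}$ for (15) and (16); the function names `a_coeff`, `b_coeff` and `entropy_gap_bound`, and the example with 100 independent Bernoulli(0.01) variables, are illustrative assumptions rather than part of the paper.

```python
import math

def a_coeff(b1, b2, b3, lam):
    """a(lambda): twice the total variation bound of Theorem 2."""
    return 2.0 * ((b1 + b2) * (-math.expm1(-lam)) / lam
                  + b3 * min(1.0, 1.4 / math.sqrt(lam)))

def b_coeff(lam, m):
    """b(lambda); the exponential factor is typically negligible (may underflow to 0)."""
    prefactor = (max(lam * math.log(math.e / lam), 0.0)
                 + lam ** 2
                 + (6.0 * math.log(2.0 * math.pi) + 1.0) / 12.0)
    exponent = lam + (m - 1) * math.log((m - 1) / (lam * math.e))
    return prefactor * math.exp(-exponent)

def entropy_gap_bound(b1, b2, b3, lam, m):
    """Upper bound on |H(Z) - H(W)| in nats, under the conditions of Theorem 5."""
    if lam > m - 1:
        raise ValueError("Theorem 5 requires lambda <= m - 1")
    a = a_coeff(b1, b2, b3, lam)
    if a > 0.5:
        raise ValueError("Theorem 5 requires a(lambda) <= 1/2")
    if a == 0.0:
        return b_coeff(lam, m)
    return a * math.log((m + 2) / a) + b_coeff(lam, m)

# Illustrative case: m = 100 independent Bernoulli(0.01) variables, so that
# b1 = sum of p_i^2 and b2 = b3 = 0 (see Remark 3).
ps = [0.01] * 100
lam = sum(ps)
bound = entropy_gap_bound(sum(p * p for p in ps), 0.0, 0.0, lam, len(ps))
print(bound)
```

In this illustrative case the term $b(\lambda)$ is many orders of magnitude smaller than $a(\lambda) \log((m+2)/a(\lambda))$, so the bound is dominated by the total variation term.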



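The following corollary follows from Theorems 4 and 5 
and Remark 3. 

Corollary 1: Consider the setting in Theorem 5 and assume 
that the Bernoulli random variables $\{X_\alpha\}_{\alpha \in I}$ are also 
independent, so that (by Remark 3) $a(\lambda) = 2 \bigl(\frac{1 - e^{-\lambda}}{\lambda}\bigr) \sum_{\alpha \in I} p_\alpha^2$. 
If $\bigl(\frac{1 - e^{-\lambda}}{\lambda}\bigr) \sum_{\alpha \in I} p_\alpha^2 \le \frac{1}{4}$ and $\lambda \le m - 1$ then, 
for $Z \sim \mathrm{Po}(\lambda)$, 

$$0 \le H(Z) - H(W) \le b(\lambda) + a(\lambda) \log\Bigl(\frac{m + 2}{a(\lambda)}\Bigr). \tag{18}$$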




The following bound forms a possible improvement of the 
result in Corollary 1. It combines the upper bound on the total 
variation distance in [6, Theorem 1] (see Theorem 1 here) with 
the upper bound on the total variation distance in [8, Eq. (30)]. 
It is noted that the bound in [8, Eq. (30)] improves the bound 
in [22, Eq. (10)] (see also EH Eq. (4)]). 

Proposition 1: Assume that the conditions in Corollary 1 
are satisfied. Then, the following inequality holds: 

$$0 \le H(Z) - H(W) \le g(p) \log\Bigl(\frac{m + 2}{g(p)}\Bigr) + b(\lambda) \tag{19}$$

if $g(p) \le \frac{1}{2}$ and $\lambda \le m - 1$, where 

$$g(p) \triangleq 2 \theta \min\Bigl\{ 1 - e^{-\lambda}, \; \frac{3}{4e (1 - \sqrt{\theta})^{3/2}} \Bigr\} \tag{20}$$

$$p \triangleq \{p_\alpha\}_{\alpha \in I}, \quad \lambda \triangleq \sum_{\alpha \in I} p_\alpha \tag{21}$$

$$\theta \triangleq \frac{1}{\lambda} \sum_{\alpha \in I} p_\alpha^2. \tag{22}$$


Remark 4: From (21) and (22), it follows that 

$$0 \le \theta \le \max_{\alpha \in I} p_\alpha \triangleq p_{\max}.$$

Furthermore, the condition $\lambda \le m - 1$ is mild since $|I| = m$ 
and the probabilities $\{p_\alpha\}_{\alpha \in I}$ should be typically small for 
the Poisson approximation to hold. 

Remark 5: Proposition 1 improves the bound in Corollary 1 
only if $\theta$ is below a certain value that depends on $\lambda$. The 
maximal improvement that is obtained by Proposition 1, as 
compared to Corollary 1, is in the case where $\theta \to 0$ and 
$\lambda \to \infty$, and the corresponding improvement in the value of 
$g(p)$ is by a factor of $\frac{3}{4e} \approx 0.276$. 

C. Some Applications of the New Error Bounds on the Entropy 

In the following, the use of Theorem 5 is first exemplified 
when the Bernoulli random variables are independent. It is also 
exemplified in a case from [2, Section 3] where dependence 
among the Bernoulli random variables exists. The use of 
Theorem 5 is exemplified for the calculation of error bounds 
on the entropy via the Chen-Stein method. 

Example 1 (sums of independent binary random variables): 
Let $W = \sum_{i=1}^n X_i$ be a sum of $n$ independent Bernoulli 
random variables where $X_i \sim \mathrm{Bern}(p_i)$ for $i = 1, \ldots, n$. 
The calculation of the entropy of $W$ involves the numerical 
computation of the probabilities 

$$\bigl(P_W(0), P_W(1), \ldots, P_W(n)\bigr) = (1 - p_1, p_1) * (1 - p_2, p_2) * \cdots * (1 - p_n, p_n)$$

where $*$ denotes convolution, and whose computational complexity 
is high for very large values 
of $n$, especially if the probabilities $p_1, \ldots, p_n$ are not the 
same. The bounds in Corollary 1 and Proposition 1 make it possible 
to get rigorous upper bounds on the accuracy of the Poisson 
approximation for $H(W)$. As was explained earlier in this 
section, the bound in Proposition 1 can only improve the 
bound in Corollary 1. Let us exemplify this in the following 
case: Suppose that 

$$p_i = 2ai, \quad \forall \, i \in \{1, \ldots, n\}, \quad a = 10^{-10}, \quad n = 10^{8}$$

then 

$$\lambda = \sum_{i=1}^n p_i = a n (n + 1) = 1{,}000{,}000.01 \approx 10^6 \tag{23}$$

$$\theta = \frac{1}{\lambda} \sum_{i=1}^n p_i^2 = \frac{2a(2n + 1)}{3} = 0.0133. \tag{24}$$



The entropy of $Z \sim \mathrm{Po}(\lambda)$ is $H(Z) = 8.327$ nats. Corollary 1 
gives that $0 \le H(Z) - H(W) \le 0.588$ nats and Proposition 1 
improves it to $0 \le H(Z) - H(W) \le 0.205$ nats. Hence, 
$H(W) \approx 8.224$ nats with a relative error of at most 1.2%. 
We note that by changing the values of a and n to 10^ 
and 10^^, respectively, it follows that H{W) « 12.932 nats 
with a relative error of at most 0.04%. The enhancement of 
the accuracy of the Poisson approximation in the latter case is 
consistent with the law of small numbers (see, e.g., [24] and 
references therein). 
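
The numbers quoted in Example 1 for $a = 10^{-10}$ and $n = 10^8$ can be reproduced with a few lines of code. The sketch below is a hedged illustration, not the author's implementation: it assumes the asymptotic expression (14) for $H(Z)$, the closed forms (23) and (24), the bounds of Corollary 1 and Proposition 1 with $m = n$, and hypothetical helper names. The series (13) is included only as a cross-check at a moderate mean ($\lambda = 50$, an arbitrary choice).

```python
import math

def poisson_entropy_series(lam, kmax):
    """Entropy of Po(lam) in nats from the series (13), truncated at kmax."""
    total = lam * math.log(math.e / lam)
    for k in range(2, kmax + 1):                 # log k! = 0 for k = 0, 1
        log_fact = math.lgamma(k + 1)            # log k!
        log_pk = k * math.log(lam) - lam - log_fact
        total += math.exp(log_pk) * log_fact
    return total

def poisson_entropy_asymptotic(lam):
    """Large-lambda approximation (14) of the entropy of Po(lam) in nats."""
    return (0.5 * math.log(2 * math.pi * math.e * lam)
            - 1 / (12 * lam) - 1 / (24 * lam ** 2))

def b_coeff(lam, m):
    """b(lambda) of Theorem 5 (negligible here; may underflow to 0)."""
    prefactor = (max(lam * math.log(math.e / lam), 0.0) + lam ** 2
                 + (6 * math.log(2 * math.pi) + 1) / 12)
    return prefactor * math.exp(-(lam + (m - 1) * math.log((m - 1) / (lam * math.e))))

def gap_bound(g, lam, m):
    """g*log((m+2)/g) + b(lambda), the common form of the bounds (17)-(19)."""
    return g * math.log((m + 2) / g) + b_coeff(lam, m)

a, n = 1e-10, 10 ** 8
m = n                                             # |I| = n Bernoulli variables
lam = a * n * (n + 1)                             # Eq. (23)
theta = 2 * a * (2 * n + 1) / 3                   # Eq. (24)

hz = poisson_entropy_asymptotic(lam)
g_cor = 2 * theta * (-math.expm1(-lam))           # a(lambda) in the independent case
g_prop = 2 * theta * min(-math.expm1(-lam),
                         3 / (4 * math.e * (1 - math.sqrt(theta)) ** 1.5))
bound_cor = gap_bound(g_cor, lam, m)              # Corollary 1
bound_prop = gap_bound(g_prop, lam, m)            # Proposition 1
hw_estimate = hz - bound_prop / 2                 # midpoint of [H(Z) - bound, H(Z)]

print(hz, bound_cor, bound_prop, hw_estimate)
```

Taking the midpoint of the interval $[H(Z) - 0.205, H(Z)]$ as the estimate of $H(W)$ is an interpretation adopted here; it is consistent with the quoted value $H(W) \approx 8.224$ nats and the 1.2% relative error.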



Example 2 (random graphs): This problem, which appears 
in [2, Example 1], is described as follows: On the cube $\{0, 1\}^n$, 
assume that each of the $n 2^{n-1}$ edges is assigned a random 
direction by tossing a fair coin. Let $k \in \{0, 1, \ldots, n\}$ be 
fixed, and denote by $W \triangleq W(k, n)$ the random variable that 
is equal to the number of vertices at which exactly $k$ edges 
point outward (so $k = 0$ corresponds to the event where all $n$ 
edges, from a certain vertex, point inward). Let $I$ be the set of 
all $2^n$ vertices, and $X_\alpha$ be the indicator that vertex $\alpha \in I$ has 
exactly $k$ of its edges directed outward. Then $W = \sum_{\alpha \in I} X_\alpha$ 
with 

$$X_\alpha \sim \mathrm{Bern}(p), \quad p = 2^{-n} \binom{n}{k}, \quad \forall \, \alpha \in I.$$

This implies that $\lambda = \binom{n}{k}$ (since $|I| = 2^n$). Clearly, the 
neighborhood of dependence of a vertex $\alpha \in I$, denoted by 
$B_\alpha$, is the set of vertices that are directly connected to $\alpha$ 
(including $\alpha$ itself since Theorem 2 requires that $\alpha \in B_\alpha$). It 
is noted, however, that $B_\alpha$ in [2, Example 1] was given by 
$B_\alpha = \{\beta : |\beta - \alpha| = 1\}$ so it excluded the vertex $\alpha$. From 
(6), this difference implies that $b_1$ in their example should be 
modified to 

$$b_1 = 2^{-n} (n + 1) \binom{n}{k}^2 \tag{25}$$



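so $b_1$ is larger than its value in [2, p. 14] by a factor of $1 + \frac{1}{n}$ 
which has a negligible effect if $n \gg 1$. As is noted in [2, 
p. 14], if $\alpha$ and $\beta$ are two vertices that are connected by an 
edge, then a conditioning on the direction of this edge gives 
that 

$$p_{\alpha,\beta} \triangleq E(X_\alpha X_\beta) = 2^{2 - 2n} \binom{n - 1}{k} \binom{n - 1}{k - 1}$$

for every $\alpha \in I$ and $\beta \in B_\alpha \setminus \{\alpha\}$, and therefore, from (7), 

$$b_2 = n \, 2^{2 - n} \binom{n - 1}{k} \binom{n - 1}{k - 1}.$$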

Finally, as is noted in [2, Example 1], $b_3 = 0$ (this is 
because the conditional expectation of $X_\alpha$ given $(X_\beta)_{\beta \in I \setminus B_\alpha}$ 
is, similarly to the unconditional expectation, equal to $p_\alpha$; 
i.e., the directions of the edges outside the neighborhood of 
dependence of $\alpha$ are irrelevant to the directions of the edges 
connecting the vertex $\alpha$). 

In the following, Theorem 5 is applied to get a rigorous 
error bound on the Poisson approximation of the entropy 
$H(W)$. Table I presents numerical results for the approximated 
value of $H(W)$, and an upper bound on the maximal relative 
error that is associated with this approximation. Note that, 
by symmetry, the cases with $W(k, n)$ and $W(n - k, n)$ are 
equivalent, so $H(W(k, n)) = H(W(n - k, n))$. 
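
The entries of Table I can be recomputed from the closed-form coefficients of this example. The following sketch rests on several assumptions: $b_1 = 2^{-n}(n+1)\binom{n}{k}^2$, $b_2 = n 2^{2-n} \binom{n-1}{k}\binom{n-1}{k-1}$ and $b_3 = 0$ as derived above, the asymptotic expression (14) for $H(Z)$, and a relative error taken as the bound (17) divided by $H(Z)$. This last convention is inferred here because it reproduces the rows quoted in full; the function name `cube_example` is hypothetical.

```python
import math

def b_coeff(lam, m):
    """b(lambda) of Theorem 5; it underflows to 0 for the parameters used here."""
    prefactor = (max(lam * math.log(math.e / lam), 0.0) + lam ** 2
                 + (6 * math.log(2 * math.pi) + 1) / 12)
    return prefactor * math.exp(-(lam + (m - 1) * math.log((m - 1) / (lam * math.e))))

def cube_example(n, k):
    """Return (lambda, approximate entropy in nats, relative error bound) for W(k, n)."""
    m = 2 ** n                                    # number of vertices, |I|
    lam = math.comb(n, k)                         # lambda = C(n, k)
    b1 = (n + 1) * lam ** 2 / m                   # Eq. (25)
    pairs = math.comb(n - 1, k) * (math.comb(n - 1, k - 1) if k >= 1 else 0)
    b2 = 4 * n * pairs / m                        # from (7)
    a = 2 * (b1 + b2) * (-math.expm1(-lam)) / lam # Eq. (15) with b3 = 0
    if a > 0.5 or lam > m - 1:
        raise ValueError("conditions of Theorem 5 are not satisfied")
    bound = a * math.log((m + 2) / a) + b_coeff(lam, m)   # Eq. (17)
    hz = (0.5 * math.log(2 * math.pi * math.e * lam)
          - 1 / (12 * lam) - 1 / (24 * lam ** 2))         # Eq. (14)
    return lam, hz, bound / hz

rows = {(n, k): cube_example(n, k)
        for (n, k) in [(30, 27), (30, 26), (30, 25), (50, 44), (100, 70)]}
for (n, k), (lam, hz, rel) in rows.items():
    print(n, k, f"{lam:.3e}", f"{hz:.3f} nats", f"{rel:.2e}")
```

Exact integer arithmetic is used for the binomial coefficients, and the division by $2^n$ is deferred to the end, so the sketch remains accurate for $n = 100$.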

D. Generalization: Bounds on the Entropy for a Sum of Non- 
Negative, Integer-Valued and Bounded Random Variables 

We introduce in [31, Section II-E] a generalization of the 
bounds in Section II-B that considers the accuracy of the 
Poisson approximation for the entropy of a sum of non-negative, 
integer-valued and bounded random variables. 



TABLE I 

Numerical results for the Poisson approximations of the 
entropy H(W) (W = W(k, n)) by the entropy H(Z) where 
Z ~ Po(λ), jointly with the associated error bounds of these 
approximations. These error bounds are calculated from 
Theorem 5 for the random graph problem in Example 2. 

n      k     λ = C(n, k)      H(W) ≈          Maximal relative error 
30     27    4.060 · 10^3     5.573 nats      0.16% 
30     26    2.741 · 10^4     6.528 nats      0.94% 
30     25    1.425 · 10^5     7.353 nats      4.33% 
50     48    1.225 · 10^3     4.974 nats      1.5 · 10^-9 
50     44    1.589 · 10^7     9.710 nats      1.0 · 10^-5 
50     40    1.027 · 10^10    12.945 nats     4.8 · 10^-3 
100    95    7.529 · 10^7     10.487 nats     1.6 · 10^-19 
100    85    2.533 · 10^17    21.456 nats     2.6 · 10^-10 
100    75    2.425 · 10^23    28.342 nats     1.9 · 10^-4 
100    70    2.937 · 10^25    30.740 nats     2.1% 

This generalization is enabled via the combination of the 
proof of Theorem 5 for sums of Bernoulli random variables 
with the approach of Serfling in [32, Section 7]. 